FOREWORD


This  research  was  conducted  in  support  of  Project  ZF63-522-002-03.40,  Techniques 
for  the  Measurement  of  Job  Performance.  The  statistical  procedure  described  in  this 
report  was  used  successfully  to  improve  the  overall  predictive  validity  of  the  scores  on 
the  Basic  Mechanical  Procedures  Test,  a  criterion-referenced,  diagnostic  test  for  boiler 
technicians. 

Special  appreciation  is  expressed  to  V.  A.  Reisenleiter,  Army  Research  Institute,  who 
brought  this  statistical  procedure  to  our  attention. 


RICHARD  C.  SORENSON 
Director  of  Programs 


SUMMARY 


Problem 

The development of a criterion-referenced, diagnostic test for boiler technicians
(BTs)  was  a  necessary  part  of  the  Personnel  Readiness  Training  Program,  a  diagnostic 
testing/shipboard  training  program.  In  the  absence  of  a  well-defined  methodology  for  the 
optimal  weighting  of  individual  item  scores  on  criterion-referenced  tests,  a  systematic 
and  practical  procedure  leading  to  the  improvement  of  the  predictive  validity  of  scores  on 
this  diagnostic  test  was  needed. 

Purpose 

The  purpose  of  this  effort  was  to  determine  whether  an  application  of  discriminant 
function analysis could significantly improve the overall predictive validity of the scores
on  a  diagnostic  criterion-referenced  test.  In  this  statistical  procedure,  the  examinee's 
score  is  based  on  the  sum  of  the  individual  item  discriminant  weights  as  opposed  to  one 
for  a  correct  response  and  zero  for  an  incorrect  one. 

Method 

The  test  employed  in  this  research  was  the  criterion-referenced,  Basic  Mechanical 
Procedures (BMP) test, which was developed by NAVPERSRANDCEN to diagnose individual
deficiencies within the BT rating. The sample consisted of 200 BTs assigned to the
shore-based Propulsion Engineering School, Service School Command, Great Lakes.
Half (the pre-instruction group) were entering the modularized, self-paced curriculum,
and half (the post-instruction group) had completed it.

Using  the  discriminant  function  procedure,  an  optimal  weighting  strategy  of  the 
individual  item  scores  was  derived  for  12  of  the  14  modules  of  the  BMP  test.  This 
discriminant  scoring  procedure  was  then  compared  to  the  traditional  number-correct 
procedure  in  terms  of  the  predictive  validity  levels  within  each  module.  The  validity  level 
was  estimated  by  comparing  actual  group  membership  (pre-  vs.  post-instruction)  with 
group membership assigned on the basis of the discriminant and number-correct scores.


Results 


1.  Most  of  the  modules  under  both  scoring  methods  demonstrated  excellent 
classification  ability  with  each  of  the  percent  agreements  significant  at  conventional 
levels. 

2.  A  significant  improvement  in  the  overall  validity  of  the  discriminant  scores  as 
compared  to  number  correct  was  noted. 

3.  Differences  between  pairs  of  validity  coefficients  in  four  of  the  five  modules 
having  the  poorest  classification  percentages  were  significant  in  favor  of  the  discriminant 
scores. 

4.  Scores  resulting  from  the  discriminant  weights  improved  the  ability  of  this 
criterion-referenced  test  to  accurately  classify  students  as  members  of  either  the  pre-  or 
post-instruction  group. 

Conclusions 

The discriminant function:

1.  Would  appear  to  be  sound  by  analogy  to  statistical  theory  and  would  utilize  well 
known  item  statistics. 

2.  Allows  the  items  to  be  scaled  along  a  discrimination  continuum  with  meaningful 
end  points. 

3. Provides an index of the usefulness of the items for discriminating between pre-
and post-instruction members' test scores. Such a determination of item weights would be
easy to program.

4.  Should  prove  especially  helpful  in  improving  the  overall  validity  of  tests 
including  items  that  are  not  as  valid  as  those  included  in  the  BMP  test. 

Recommendation 

The  discriminant  function  should  be  considered  as  an  alternative  procedure  to  the 
more  conventional  "number-correct"  procedure  for  determining  item  scores. 
INTRODUCTION


Problem  and  Background 

Criterion-referenced  measurement  has  been  one  of  the  most  provocative  ideas  to 
influence educational measurement theory and practice in recent years. Although the
large number of articles that have been published in this area of measurement describe a
variety  of  criterion-referenced  tests,  it  is  still  difficult  to  find  the  precise  definition  of  a 
criterion-referenced  test.  Typically,  criterion-referenced  measures  are  used  to  provide 
information  on  the  status  of  an  individual's  knowledge  and  skill  with  respect  to  some 
criterion  or  standard  of  performance  (Popham  &  Husek,  1969).  Depending  on  how  the  test 
results  are  used,  however,  the  same  test  can  be  either  norm-  or  criterion-referenced 
(Hambleton  &  Novick,  1973).  For  example,  using  the  results  of  a  typing  test  to  choose  the 
fastest  typist  or  assign  the  highest  typing  grade  represents  a  norm-referenced  procedure. 
Using  the  results  of  the  same  typing  test  to  decide  that  more  practice  or  training  is 
needed  when  a  cutoff  score  is  not  met  represents  a  criterion-referenced  procedure.  The 
latter  application  contains  the  essence  of  the  definition  of  a  criterion-referenced  test 
that will be used in this note. That is, criterion-referenced measurement emphasizes the
description  of  the  absolute  rather  than  relative  level  of  performance  with  respect  to  a 
well-defined  behavior  domain. 

In  the  context  of  the  above  definition  of  criterion-referenced  measurement,  test 
validity  has  been  approached  in  the  framework  of  both  a  modified  classical  theory 
(Livingston,  1972)  and  Bayesian  statistics  (Hambleton  &  Novick,  1973).  In  each  instance, 
classical  procedures  for  estimating  variability  of  criterion-referenced  test  scores  yielded 
spuriously  low  validity  estimates  based  on  correlational  procedures.  In  terms  of  the 
purpose  of  these  tests,  which  is  to  determine  the  degree  to  which  any  student  has 
mastered  a  set  of  objectives,  either  face  or  content  validity  measures  are  considered 
appropriate solutions to the validation problem. The content validity approach, which is
better suited for such tests, can be determined, according to Popham and Husek (1969), by
"a  carefully  made  judgment,  based  on  the  test's  apparent  relevance  to  the  behaviors 
legitimately  inferable  from  those  delimited  by  the  criterion."  If  a  technique  such  as 
advocated  by  Bormuth  (1970)  for  defining  content  domains  and  item  generation  rules  is 
followed,  content  validity  is  necessarily  guaranteed. 

Inherent  in  the  qualitative  approaches  to  the  measurement  of  test  validity  is  the 
inability  to  quantify  the  degree  to  which  criterion-referenced  tests  lead  to  accurate 
diagnostic  decisions.  Fortunately,  Panell  and  Laabs  (1979)  developed  an  alternative 
procedure  that  does  provide  a  useful,  quantitative  estimate  of  the  predictive  validity  level 
of  criterion-referenced  test  items.  In  this  procedure,  students  in  a  cross-validation 
sample  are  classified  as  either  mastery  or  nonmastery  achievers,  according  to  how  their 
performance  compared  to  a  predetermined  cutoff  score.  This  classification  is  then 
compared  with  actual  group  membership,  and  the  percent  agreement  between  predicted 
and  actual  group  membership  is  obtained.  Once  an  estimate  is  derived  using  this 
procedure,  a  search  for  alternative  scoring  methods  that  might  further  enhance  the 
reliability  and  validity  of  criterion-referenced  items  can  be  made. 

For  more  than  three  decades,  researchers  have  investigated  the  effects  of  scoring 
methods  on  reliability  and  validity  of  norm-referenced  test  items.  A  small  sample  of  the 
many  variations  appearing  in  the  literature  include  elimination  scoring  (Coombs, 
Milholland,  &  Womer,  1956),  Guttman  weighting  (Guttman,  1941),  judged  confidence 
weighting (Patnaik & Traub, 1973), confidence weighting (Shuford, Albert, & Massengill,
1966),  and  the  conventional  correction  for  guessing  (Lord  &  Novick,  1968).  In  each  of 
these  methods,  an  attempt  is  made  to  increase  the  reliability  and  validity  of  inferences 
made  from  scores  by  capitalizing  on  the  different  levels  of  knowledge  reflected  in  both 
the  correct  and  incorrect  responses.  For  example,  in  the  judged  confidence  method,  the 
responses  to  a  multiple-choice  item  are  differentially  weighted  according  to  their 
prejudged degree of correctness. In the elimination method, the examinee is instructed to
indicate  which  of  the  options  he  can  identify  as  incorrect,  and  his  score  is  determined  by 
the  number  of  distractors  correctly  identified.  In  confidence  weighting,  the  response 
alternative  the  examinee  would  have  selected  is  inferred  from  the  manner  in  which  he  has 
assigned  personal  probabilities  to  the  item-choices.  The  question  of  whether  the  use  of 
scoring  procedures  other  than  number-correct  does  lead  to  more  valid  tests,  despite  the 
intuitive  appeal  of  differential  response  weighting,  has  not  been  clearly  answered.  In 
point  of  fact,  Stanley  and  Wang  (1970),  in  their  comprehensive  review  of  differential 
response  weighting,  report  that  studies  of  this  kind  have  been  far  from  conclusive  in 
demonstrating  consistent  gains  in  desired  psychometric  properties.  In  addition,  since 
these  methods  do  not  fit  within  the  criterion-referenced  measurement  framework,  they 
should  not  be  used  for  improvement  of  such  tests. 

An  alternative  procedure  to  the  differential  weighting  of  response  options  used  with 
norm-referenced tests, one that is acceptable for criterion-referenced measurement
usage,  involves  optimal  weighting  of  the  individual  item  scores  themselves  in  an  effort  to 
maximize  the  overall  predictive  validity  of  the  test.  A  possible  methodological  candidate 
capable  of  providing  this  type  of  optimal  weighting  strategy  is  the  plug-in  discriminant 
function  analysis  for  discrete  binary  items  (Elvers,  1977).*  In  this  statistical  procedure, 
which  was  brought  to  our  attention  by  V.  A.  Reisenleiter,  Army  Research  Institute,  the 
examinee's  score  is  based  on  the  sum  of  the  individual  item  discriminant  weights  as 
opposed  to  one  for  a  correct  response  and  zero  for  an  incorrect  one. 


*The plug-in discriminant function was selected rather than a more traditional
approach,  such  as  Fisher's  (1936)  linear  discriminant  function,  because  its  statistical 
assumptions  fit  nicely  within  the  framework  of  the  problem  addressed  in  this  study,  and 
the  mathematical  computations  necessary  to  apply  the  plug-in  technique  are  much  shorter 
and  simpler  than  those  needed  for  more  traditional  approaches.  While  Fisher's  technique, 
which  is  based  on  log-likelihood  ratios,  provides  a  useful  tool  for  discriminating  between 
populations,  it  may  be  quite  unsuitable  for  allocating  a  particular  subject  to  one  of  two 
populations  when  the  underlying  distribution  is  far  from  multivariate  normal. 

The purpose of this effort was to determine whether or not an application of the
plug-in discriminant function could significantly improve the overall predictive validity of
the scores on a diagnostic criterion-referenced test.


METHOD 


Test 

The  test  employed  in  this  research  was  the  criterion-referenced,  Basic  Mechanical 
Procedures  (BMP)  test,  which  was  developed  at  the  Navy  Personnel  Research  and 
Development Center to diagnose individual deficiencies within the boiler technician rating
(BT)  (Laabs,  Harris,  &  Pickering,  1977).  The  BMP  is  keyed  to  the  14  instructional  modules 
listed  in  Table  1,  and  has  a  prescribed  administration  time  of  90  minutes.  The  response 
vectors  and  validity  data  for  200  BTs  that  were  used  in  the  construction  and  validation  of 
the  BMP  test  were  available. 


Table 1

Basic Skills and Knowledges Modules

Module   Title

   1     Metal Fasteners, Hand Tools
   2     Pipes, Tubings, Fittings
   3     Packing, Gaskets, Insulation
   4     Valves
   5     Bearings, Lubrication
   6     Pumps
   7     Precision Measurement Instruments, Technical Manuals
   8     Heat Properties, Heat Exchangers
   9     Indicating Devices
  10     Turbines, Couplings, Gears
  11     Strainers, Purifiers
  12     Low Pressure Air System and Compressor
  13     Oil Pollution
  14     Planned Maintenance System

Sample 

The  sample  consisted  of  200  BTs  assigned  to  the  shore-based  Propulsion  Engineering 
School, Service School Command, Great Lakes. Half of them (the pre-instruction
group) were entering the modularized, self-paced curriculum, and half (the
post-instruction group) had completed it. For each group, response data from 25
randomly selected students were put aside for cross-validation purposes.

Plug-in  Discriminant  Analysis 

In this procedure, the items to be analyzed are assumed to be independent binary
random variables with possible values of 1 and 0. Moreover, it is necessary to maintain a
minimum  of  10  subjects  for  every  item  included  in  this  particular  application  of  the 
discriminant  analysis  to  assure  stable  statistical  estimates  of  the  actual  or  population 
discriminant  weights.  Granting  these  assumptions  are  met  by  the  BMP  test  modules,  the 
model  for  the  plug-in  discriminant  function  is  computed  as  follows: 

$$D_j = \sum_{i=1}^{I}\left[X_{ij}\,\log_e\frac{\psi_i^{(2)}}{\psi_i^{(1)}} - (1 - X_{ij})\,\log_e\frac{1-\psi_i^{(1)}}{1-\psi_i^{(2)}}\right]$$

Where  $D_j$ = The discriminant score for person j on I items.

       $\log_e$ = A logarithm with e as a base.

       $X_{ij}$ = Response of individual j to item i.

       $\psi_i^{(1)}$ = The probability of an examinee from group 1 answering item i
       correctly. $\psi_i^{(1)}$ is estimated, in this study, as the proportion of
       correct responses within the pre-instruction group (1) on item i.

       $\psi_i^{(2)}$ = The probability of an examinee from group 2 answering item i
       correctly. $\psi_i^{(2)}$ is estimated, in this study, as the proportion of
       correct responses within the post-instruction group (2) on item i.
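
The weighting rule defined above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Appendix A program: the function names `item_weights` and `discriminant_score` are hypothetical, and the inputs are assumed to be per-item group proportions lying strictly between 0 and 1.

```python
import math


def item_weights(psi1, psi2):
    """Return one (weight_if_correct, weight_if_incorrect) pair per item.

    psi1 and psi2 are assumed to hold the proportions correct in the
    pre-instruction (1) and post-instruction (2) groups.
    """
    weights = []
    for p1, p2 in zip(psi1, psi2):
        if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0):
            raise ValueError("each proportion must lie strictly between 0 and 1")
        w_correct = math.log(p2 / p1)
        w_incorrect = -math.log((1.0 - p1) / (1.0 - p2))
        weights.append((w_correct, w_incorrect))
    return weights


def discriminant_score(responses, weights):
    """Sum the weight selected by each binary (1/0) item response."""
    return sum(w1 if x == 1 else w0 for x, (w1, w0) in zip(responses, weights))
```

An item on which both groups perform equally receives weights of zero, so it contributes nothing to the score regardless of the response.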

To illustrate the basic mechanical operation of this formula, consider the following
numerical example for a single-item module (I = 1), where there are five students in each
instruction group. Suppose only one of the five pre-instruction members answers our
hypothetical one-item test correctly ($\psi_1^{(1)} = .20$), compared to three of the five
post-instruction members ($\psi_1^{(2)} = .60$). The resulting discriminant function would
appear as follows:

$$D_j = X_{1j}\,\log_e\frac{.60}{.20} - (1 - X_{1j})\,\log_e\frac{.80}{.40} = 1.10\,X_{1j} - .69\,(1 - X_{1j}).$$

In this example, the plug-in discriminant assigns a positive item weight (i.e., discriminant
score) to a correct outcome ($X_{1j} = 1$) on item 1 and a negative item weight to an incorrect
outcome ($X_{1j} = 0$). Thus, an examinee who answered item one on our hypothetical test
correctly would receive a discriminant score of 1.10, while one who answered item one
incorrectly would receive a discriminant score of -.69.*

*In the event that $\psi_i^{(k)}$ equals unity or zero, the natural logarithm expression in the
discriminant function yields an indeterminate solution (i.e., the logarithm to the base e is
undefined at 0). For this reason, the following is recommended:

   When $\psi_i^{(k)} = 1$, set $\psi_i^{(k)}$ equal to [expression in $N_k$ not legible in the source]
   When $\psi_i^{(k)} = 0$, set $\psi_i^{(k)}$ equal to [expression in $N_k$ not legible in the source]

where $N_k$ is the number of examinees in group k. This prevents the assignment of infinity
as an item weight.
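
The single-item example above can be checked numerically. The short Python sketch below simply recomputes the two item weights from the stated proportions; the variable names are illustrative and do not come from the report.

```python
import math

psi_pre = 1 / 5   # one of five pre-instruction members answers correctly
psi_post = 3 / 5  # three of five post-instruction members answer correctly

# Weight applied when the item is answered correctly (X = 1).
w_correct = math.log(psi_post / psi_pre)
# Weight applied when the item is answered incorrectly (X = 0).
w_incorrect = -math.log((1 - psi_pre) / (1 - psi_post))

print(f"{w_correct:.2f} {w_incorrect:.2f}")  # prints "1.10 -0.69"
```
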
To demonstrate how easily item weights can be computed in this manner, a simple
FORTRAN program, appearing in Appendix A, was created to generate item weights and
subsequent discriminant scores. A sequence of information describing the necessary input
parameters must be entered as part of the program before any data analysis can begin.
This information includes (1) the total number of subjects and items to be processed, (2)
the two $\psi$ parameter estimates for each item of the test, and (3) a binary-scored response
vector for each subject. The output for each plug-in discriminant analysis includes (1) the
identification card, which serves as a heading for the output, (2) the discriminant
weights for each item, and (3) the discriminant scores for each subject. The program is
presently limited to a maximum of 200 subjects and 80 items. These limits can be raised
by increasing the array sizes within the program.
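
The two $\psi$ estimates that the program expects as input are per-item proportions correct within each group. A minimal Python sketch of that estimation step follows; the function name `estimate_psi` and the tiny response matrix are hypothetical and serve only to show the arithmetic.

```python
def estimate_psi(response_vectors):
    """Return the proportion of correct (1) responses for each item.

    response_vectors is assumed to be a non-empty list of equal-length
    lists of 0/1 scores, one list per examinee in a single group.
    """
    n_examinees = len(response_vectors)
    n_items = len(response_vectors[0])
    return [
        sum(vector[i] for vector in response_vectors) / n_examinees
        for i in range(n_items)
    ]


# Invented data: five examinees, two items.
pre_group = [[1, 0], [0, 0], [0, 1], [0, 0], [0, 1]]
print(estimate_psi(pre_group))  # prints [0.2, 0.4]
```

Proportions of exactly 0 or 1 would still need the adjustment described in the footnote before logarithms are taken.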

The FORTRAN program was used with the 150 item-response sets from the pre- and
post-instruction BMP test answer sheets to compute the discriminant
coefficients for each of the items and the subsequent discriminant scores for the
examinees in the validation group. This analysis was repeated for each module in a
similar manner.

Estimating Validity

To  evaluate  the  validity  of  the  criterion-referenced  BMP  test  modules,  a  predictive 
evaluation approach was taken (see Laabs & Panell, 1978, or Panell & Laabs, 1979). This
approach  to  the  problem  of  estimating  the  validity  of  criterion-referenced  tests  followed 
a  prescribed  set  of  procedures  that  includes  the  establishment  of  a  cutoff  score  for  each 
module  and  the  subsequent  use  of  this  criterion  to  classify  students  in  the  cross-validation 
sample. Specifically, if an examinee's conventional score fell at or above the appropriate
cutoff  score,  he  was  classified  as  a  post-instruction  group  member.  If  his  performance 
fell  below  the  criterion,  he  was  classified  as  a  pre-instruction  group  member.  This 
classification  was  compared  to  actual  group  membership  in  the  validation  sample  and  the 
percentage of agreement was determined. It should be noted that this concept of validity
necessarily assumes that the instructional material to which the test is being keyed is
effective.
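
The number-correct classification rule and the percent-agreement index described above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the six scores are invented, and the cutoff of 7 mirrors the 7/8 criterion listed for Module 1 in Appendix B.

```python
def classify_number_correct(score, cutoff):
    """Assign 'post' when the score is at or above the cutoff, else 'pre'."""
    return "post" if score >= cutoff else "pre"


def percent_agreement(actual, predicted):
    """Percentage of examinees whose predicted group equals the actual group."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return 100.0 * matches / len(actual)


# Invented number-correct scores on an eight-item module.
scores = [8, 6, 7, 5, 3, 7]
actual = ["post", "pre", "post", "pre", "pre", "pre"]
predicted = [classify_number_correct(s, cutoff=7) for s in scores]
print(predicted)
print(round(percent_agreement(actual, predicted), 1))
```
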


The estimate of the validity of the criterion-referenced BMP test modules using the
conventional scoring method was compared to that using the discriminant scoring method.
In the discriminant scoring method, the item weights within each module were computed
using the developmental sample of 150 BTs, with the cutoff score in this instance equaling
$\log_e(\pi_1/\pi_2)$, where $\pi_1$ and $\pi_2$ equal the proportions of pre- and post-instruction
members in the cross-validation groups. Since the proportion of pre-instruction members equaled
the proportion of post-instruction members in the validation sample, the cutoff criterion
equaled $\log_e 1$, or zero, for each module. Predictive validity was therefore estimated by
using this new criterion to classify the examinees in the cross-validation sample as
illustrated below.

Assign subject j to the pre-instruction group if $D_j$ is less than C and to the
post-instruction group if $D_j$ is greater than or equal to C, where the cutoff point C equals
zero.* This classification was compared to actual group membership in the cross-
validation sample, and the percentage of agreement was determined.
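
The cutoff and assignment rule just described can be sketched in Python. The function names `cutoff_from_group_sizes` and `assign_group` are hypothetical; the group sizes of 25 and 25 are those of the cross-validation sample described under Sample.

```python
import math


def cutoff_from_group_sizes(n_pre, n_post):
    """Cutoff C = log_e(pi_1 / pi_2), with pi_k the group proportions."""
    total = n_pre + n_post
    return math.log((n_pre / total) / (n_post / total))


def assign_group(d_score, cutoff):
    """Pre-instruction if the score is below the cutoff, else post-instruction."""
    return "pre" if d_score < cutoff else "post"


c = cutoff_from_group_sizes(25, 25)
print(c)                       # prints 0.0 for equal group sizes
print(assign_group(1.10, c))   # prints "post"
print(assign_group(-0.69, c))  # prints "pre"
```

With unequal groups the cutoff moves away from zero, which shifts borderline examinees toward the larger group.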


*The  cutoff  criterion  employed  in  the  discriminant  scoring  procedure  was  chosen  for 
use  to  minimize  the  total  probability  of  misclassifying  examinees  in  terms  of  group 
membership (Glick, 1973).


RESULTS  AND  DISCUSSION 


Table  2  reports  the  estimated  validity  coefficients  for  module  subtests  predicted 
from  number-correct  and  discriminant  scores  (modules  4  and  6  did  not  meet  the  subject- 
to-item  ratio  requirement  for  discriminant  scores).  These  results  show  that  most  of  the 
subtests,  under  both  scoring  methods,  demonstrated  excellent  classification  ability.  Each 
of  the  percent  agreements  was  significant  at  conventional  levels,  as  determined  by  a  chi- 
square  test  with  Yates  correction  for  continuity  (Appendices  B  and  C  contain  the 
contingency  tables  and  cross-validation  data).  Although  the  uniformly  large  classification 
percentages  achieved  under  the  number-correct  scoring  method  indicated  little  room  for 
improvement  in  individual  subtest  validity,  an  inspection  of  Table  2  reveals  an  increase  in 
the overall validity of the discriminant scores as compared to the number-correct scores.
An analysis was performed to confirm this observation by comparing the validity
coefficients obtained under both scoring procedures across 12 modules (all but modules 4
and 6). The differences between validity coefficients for the 12 modules were then
submitted to a Wilcoxon matched-pairs signed-ranks test, which showed a significant
improvement in the overall test validity (Z = 5.39, p < .001). Moreover, the differences
between pairs of validity coefficients in the 5 modules having the poorest classification
percentages (Nos. 2, 3, 5, 8, and 9) were analyzed, in a post hoc fashion, by a statistical
test for correlated proportions; all but module 9 were significant at conventional levels.
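
The chi-square test with Yates correction mentioned above can be reproduced for any of the 2 x 2 contingency tables in the appendices. The Python sketch below uses the standard continuity-corrected formula; the function name `yates_chi_square` is hypothetical, and the cell counts are those listed for Module 1 under number-correct scoring in Appendix B.

```python
def yates_chi_square(a, b, c, d):
    """Continuity-corrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Reduce |ad - bc| by n/2, never letting the corrected value go below zero.
    corrected = max(abs(a * d - b * c) - n / 2.0, 0.0)
    numerator = n * corrected ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Module 1: actual Pre row = (21, 4), actual Post row = (2, 23).
print(round(yates_chi_square(21, 4, 2, 23), 2))  # prints 26.09
```
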

The outcome of this study regarding the predictive validity of number-correct and
discriminant scores on the BMP test modules was in the expected direction. Scores
resulting from the discriminant weights improved the ability of this criterion-referenced
test to accurately classify students in the cross-validation sample as members of either
the pre- or post-instruction group. It is particularly important to note that, although this
test had very high validity before the application of discriminant weights, the
differences between the classification percentages for the two scoring procedures favored
the discriminant application in 8 of the 12 modules compared. Three modules (1, 11, and
13) showed no difference, and a small negative difference was recorded for module 10.
Thus, it would appear that an application of the "plug-in" discriminant function did
improve the overall validity of this criterion-referenced test.


Table 2

Agreement in Classification of a Cross-Validation Sample

N = 50

                                                   Number-Correct    Discriminant Weights
Module   Title                                     Percent           Percent
                                                   Agreement         Agreement

   1     Metal Fasteners, Hand Tools                    .88               .88
   2     Pipes, Tubings, Fittings                       .78               .86
   3     Packing, Gaskets, Insulation                   .68               .82
   4     Valves                                         .86               --a
   5     Bearings, Lubrication                          .74               .78
   6     Pumps                                          .92               --a
   7     Precision Measurement Instruments,
         Technical Manuals                              .86               .88
   8     Heat Properties, Heat Exchangers               .72               .78
   9     Indicating Devices                             .78               .80
  10     Turbines, Couplings, Gears                     .80               .78
  11     Strainers, Purifiers                           .88               .88
  12     Low Pressure Air System and
         Compressor                                     .88               .90
  13     Oil Pollution                                  .86               .86
  14     Planned Maintenance System                     .90               .92

a An insufficient ratio of subjects to items existed in modules 4 and 6, thus violating one
of the basic assumptions of the discrete discriminant model.
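
The module-by-module comparison in Table 2 can be tallied with a short Python sketch. The percentages are transcribed from Table 2 (modules 4 and 6 omitted, as in the report); the dictionary and list names are illustrative only.

```python
# Percent agreement by module, transcribed from Table 2.
number_correct = {1: 88, 2: 78, 3: 68, 5: 74, 7: 86, 8: 72,
                  9: 78, 10: 80, 11: 88, 12: 88, 13: 86, 14: 90}
discriminant = {1: 88, 2: 86, 3: 82, 5: 78, 7: 88, 8: 78,
                9: 80, 10: 78, 11: 88, 12: 90, 13: 86, 14: 92}

# Sort modules by the sign of the difference between the two scoring methods.
higher = [m for m in number_correct if discriminant[m] > number_correct[m]]
tied = [m for m in number_correct if discriminant[m] == number_correct[m]]
lower = [m for m in number_correct if discriminant[m] < number_correct[m]]

print(higher)  # modules where discriminant scoring classified better
print(tied)    # modules with identical agreement
print(lower)   # modules where number-correct scoring classified better
```
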
CONCLUSION


The  results  of  this  effort  demonstrate  that  discriminant  coefficients  used  as  item 
weights  for  scoring  purposes  can  significantly  enhance  the  BMP  test's  capacity  to 
correctly  discriminate  between  those  students  who  need  additional  instruction  and  those 
who do not. This scoring procedure increased predictive validity by capitalizing,
through differential item weighting, on the differing validity levels of the
individual items. The only limitation of this procedure is the 10 to 1 subject-to-item ratio
that  must  exist  before  using  this  application  of  the  discriminant  model.  In  addition  to  the 
gains  in  psychometric  qualities  displayed  by  this  application  of  the  plug-in  discriminant 
function,  the  following  properties  should  also  be  included  as  assets  of  this  procedure: 

1.  The  discriminant  function  would  appear  to  be  sound  by  analogy  to  statistical 
theory  and  would  utilize  well  known  item  statistics. 

2.  The  discriminant  function  allows  the  items  to  be  scaled  along  a  discrimination 
continuum  with  meaningful  end  points. 

3. The discriminant function provides an index of the usefulness of the items for
discriminating between pre- and post-instruction members' test scores. Such a
determination of item weights would be easy to program, as is demonstrated by the example
appearing in Appendix A.

4.  The  discriminant  function  should  prove  especially  helpful  in  improving  the 
overall  validity  of  tests  whose  items  are  not  as  valid  to  begin  with  as  the  BMP  test. 

RECOMMENDATION 

In view of these properties, the plug-in discriminant function should be considered as
an  alternative  procedure  to  the  more  conventional  number-correct  procedure  for 
determining  item  scores.  This  grading  method  for  the  BMP  test,  coupled  with  an  adequate 
item  development  procedure  such  as  the  one  outlined  in  Laabs  and  Panell  (1978)  and 
Panell  and  Laabs  (1979),  provides  the  researcher  with  an  excellent  means  for  maximizing 
item  information  on  criterion-referenced  tests. 
APPENDIX A

PROGRAM  FOR  PLUG-IN  DISCRIMINANT  FUNCTION  ANALYSIS 


C     PROGRAM LANGUAGE = FORTRAN IV
C     PROGRAM WRITTEN FOR UNIVAC 1110
C
C     MAXIMUM CAPABILITIES ARE:
C       UP TO 200 EXAMINEES
C         NUMBER OF SS READ AS NN ON DATA PARAMETER CARD.
C       UP TO 80 ITEMS
C         NUMBER OF ITEMS READ AS II ON DATA PARAMETER CARD.
C       NUMBER OF SS OR ITEMS MAY BE INCREASED IF NECESSARY BY
C       CHANGING DIMENSIONS.
C     THIS ANALYSIS REQUIRES THAT THE PARAMETER ESTIMATES ARE 0<PSI<1
C     AND THE SUBJECTS' VECTOR SCORES ARE GRADED IN A DICHOTOMOUS
C     MANNER (0,1).
C
C     SUBMIT CARDS IN FOLLOWING ORDER:
C       PROGRAM CARDS
C
C       YOUR IDENTIFICATION CARD. USE ANY OF THE 80 COLS. FOR
C       INFORMATION THAT YOU WISH TO APPEAR AS A HEADING FOR YOUR
C       OUTPUT.
C
C       THE NEXT CARD IS THE DATA PARAMETER CARD AND SHOULD BE PUNCHED
C       IN THE FOLLOWING MANNER:
C         COL. 1-5   THE NUMBER OF SUBJECTS (UP TO 200).
C         COL. 6-10  THE NUMBER OF ITEMS (UP TO 80).
C
C       THE NEXT CARDS ARE THE PSI PARAMETER ESTIMATE CARDS AND SHOULD
C       BE PUNCHED IN THE FOLLOWING MANNER
C       (16 ESTIMATES PER CARD, 4 COLUMNS EACH PLUS 1 BLANK):
C         COL. 1-4   PARAMETER ESTIMATE FOR GROUP 1 ITEM 1.
C         COL. 5     BLANK
C         COL. 6-9   PARAMETER ESTIMATE FOR GROUP 1 ITEM 2.
C         COL. 10    BLANK
C       CONTINUE IN THIS MANNER FOR EACH OF THE ESTIMATES IN PSI1. THEN,
C       BEGINNING ON A NEW CARD, REPEAT THIS PROCEDURE FOR PSI2.
C
C       THE NEXT COLLECTION OF CARDS IS THE DATA CARDS, WHERE EACH
C       SHOULD BE PUNCHED IN THE FOLLOWING MANNER:
C         COL. 1-80 (IF NECESSARY)  THE DICHOTOMOUSLY SCORED (0,1)
C         RESPONSE VECTOR
C
      DIMENSION ISCOR(200,80),PSI(2,80),DISCOR(200),DISWT(80,2),
     1          IDENT(20)
C
      READ(5,11) (IDENT(M), M=1,20)
   11 FORMAT(20A4)
      WRITE(6,12) (IDENT(M), M=1,20)
   12 FORMAT('1',20A4,2(/))
C
C     READING DATA PARAMETERS.
      READ(5,13) NN,II
   13 FORMAT(2I5)
C     READING PSI PARAMETER ESTIMATES.
      DO 5 IT=1,2
      READ(5,14) (PSI(IT,I), I=1,II)
   14 FORMAT(16(F4.2,1X))
    5 CONTINUE
C     READING SUBJECTS' VECTOR SCORES.
      DO 9 J=1,NN
      READ(5,15) (ISCOR(J,K), K=1,II)
   15 FORMAT(80I1)
    9 CONTINUE
C
C     CLEARING DISCRIMINANT ARRAYS TO ZERO.
      DO 10 IN=1,NN
      DISCOR(IN)=0.0
   10 CONTINUE
C
C     COMPUTE ITEM DISCRIMINANT COEFFICIENTS.
      DO 20 I=1,II
      DISWT(I,1)=ALOG(PSI(2,I)/PSI(1,I))
      DISWT(I,2)=ALOG((1.0-PSI(2,I))/(1.0-PSI(1,I)))
   20 CONTINUE
C
C     PRINT DISCRIMINANT COEFFICIENTS.
      WRITE(6,16)
   16 FORMAT(2X,'ITEM WEIGHTS',2(/))
C
      DO 30 IC=1,II
      WRITE(6,17) (DISWT(IC,JJ), JJ=1,2)
   17 FORMAT(1X,F7.3,2X,F7.3)
   30 CONTINUE
C
C     COMPUTE DISCRIMINANT SCORE.
      DO 40 N=1,NN
      DO 50 I=1,II
      D=DISWT(I,1)*ISCOR(N,I)+
     1  DISWT(I,2)*(1-ISCOR(N,I))
      DISCOR(N)=DISCOR(N)+D
   50 CONTINUE
   40 CONTINUE
C
C     PRINT DISCRIMINANT SCORES.
      WRITE(6,18)
   18 FORMAT(2(/),' DISCRIMINANT SCORES',2(/))
C
      WRITE(6,19) (KK,DISCOR(KK), KK=1,NN)
   19 FORMAT(1X,I3,5X,F8.3)
      STOP
      END


APPENDIX B

NUMBER-CORRECT: CONTINGENCY TABLES AND CROSS-VALIDATION DATA


Module 1     Criterion = 7/8     % Agreement = 88     χ² = 26.09, p < .01

                             Diagnosed Group Membership
                                  Pre       Post
Actual Group         Pre           21          4
Membership           Post           2         23

Module 2     Criterion = 3/6     % Agreement = 78     χ² = 16.77, p < .01

                             Diagnosed Group Membership
                                  Pre       Post
Actual Group         Pre           14         11
Membership           Post           0         25

Module 3     Criterion = 3/5     % Agreement = 68     χ² = 5.12, p < .01

                             Diagnosed Group Membership
                                  Pre       Post
Actual Group         Pre           17          8
Membership           Post           8         17

B-1 


Module  4  Criterion  =  11/17  %  Agreement  =86  =  23.16,  £<.01 

Diagnosed  Group  Membership 


Pre 

Post 

Actual  Group 

Pre 

21 

4 

Membership 

Post 

3 

22 

Module  5  Criterion  =  4/8 

%  Agreement  • 

=  74 

> 

=  9.96,  £<.01 

Diagnosed  Group  Membership 

Pre 

Post 

Actual  Group 

Pre 

19 

6 

Membership 

Post 

7 

18 

Module 6   Criterion = 9/12   % Agreement = 92   χ² = 32.31, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         24         1
Membership       Post         3        22

Module 7   Criterion = 7/10   % Agreement = 86   χ² = 23.16, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         21         4
Membership       Post         3        22


Module 8   Criterion = 2/4   % Agreement = 72   χ² = 8.21, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         16         9
Membership       Post         5        20

Module 9   Criterion = 5/7   % Agreement = 78   χ² = 13.54, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         19         6
Membership       Post         5        20

Module 10   Criterion = 5/7   % Agreement = 80   χ² = 16.64, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         17         8
Membership       Post         2        23

Module 11   Criterion = 4/6   % Agreement = 88   χ² = 26.60, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         20         5
Membership       Post         1        24

Module 12   Criterion = 6/8   % Agreement = 88   χ² = 26.09, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         21         4
Membership       Post         2        23

Module 13   Criterion = 3/4   % Agreement = 86   χ² = 24.08, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         19         6
Membership       Post         1        24

Module 14   Criterion = 4/6   % Agreement = 90   χ² = 29.30, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         21         4
Membership       Post         1        24


APPENDIX C

DISCRIMINANT: CONTINGENCY TABLES AND CROSS-VALIDATION DATA


Module 1   Criterion = 0   % Agreement = 88   χ² = 26.09, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         21         4
Membership       Post         2        23
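
With discriminant scoring the criterion is a cut point on the weighted score rather than a number of items correct. The sketch below shows one way such a table could be tallied. It is an illustration under stated assumptions: the helper names and the scores are hypothetical, and it assumes that a score above the criterion of 0 is diagnosed as Post and a score at or below it as Pre. The appendix does not state the direction of the rule or how ties are handled.

```python
def diagnose(score, criterion=0.0):
    """Assumed rule: scores above the criterion are diagnosed as Post."""
    return "Post" if score > criterion else "Pre"

def tally(actual_groups, scores, criterion=0.0):
    """Build a 2 x 2 table indexed as table[actual][diagnosed]."""
    table = {"Pre": {"Pre": 0, "Post": 0},
             "Post": {"Pre": 0, "Post": 0}}
    for actual, score in zip(actual_groups, scores):
        table[actual][diagnose(score, criterion)] += 1
    return table

# Hypothetical examinees (not data from the report).
actual = ["Pre", "Pre", "Pre", "Post", "Post", "Post"]
scores = [-1.2, -0.3, 0.4, 0.9, 1.5, -0.1]

table = tally(actual, scores)
hits = table["Pre"]["Pre"] + table["Post"]["Post"]
agreement = 100.0 * hits / len(actual)
```

In this made-up sample, four of the six examinees fall on the main diagonal, which is the quantity reported as % Agreement in the tables.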

Module 2   Criterion = 0   % Agreement = 86   χ² = 23.46, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         20         5
Membership       Post         2        23

Module 3   Criterion = 0   % Agreement = 82   χ² = 18.75, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         23         2
Membership       Post         7        18

Module 5   Criterion = 0   % Agreement = 78   χ² = 15.91, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         21         4
Membership       Post         7        18


Module 7   Criterion = 0   % Agreement = 88   χ² = 26.09, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         21         4
Membership       Post         2        23

Module 8   Criterion = 0   % Agreement = 78   χ² = 13.54, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         19         6
Membership       Post         5        20

Module 9   Criterion = 0   % Agreement = 80   χ² = 15.78, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         19         6
Membership       Post         4        21

Module 10   Criterion = 0   % Agreement = 78   χ² = 14.08, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         17         8
Membership       Post         3        22


Module 11   Criterion = 0   % Agreement = 88   χ² = 26.6, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         20         5
Membership       Post         1        24

Module 12   Criterion = 0   % Agreement = 90   χ² = 29.30, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         21         4
Membership       Post         1        24

Module 13   Criterion = 0   % Agreement = 86   χ² = 24.08, p < .01

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         19         6
Membership       Post         1        24

Module 14   Criterion = 0   % Agreement = 92   χ² = 32.00

                     Diagnosed Group Membership
                            Pre      Post
Actual Group     Pre         23         2
Membership       Post         2        23
